
Computational and Structural Biotechnology Journal

American Association for the Advancement of Science (AAAS)

Preprints posted in the last 90 days, ranked by how well they match Computational and Structural Biotechnology Journal's content profile, based on 216 papers previously published here. The average preprint has a 0.30% match score for this journal, so anything above that is already an above-average fit.

1
Computational Design and Atomistic Validation of a High-Affinity VHH Nanobody Targeting the PI/RuvC Interface of Streptococcus pyogenes Cas9: A Bivalent Hub Strategy for CRISPR-Cas9 Enhancement

Kumar, N.; Dalal, D.; Sharma, V.

2026-03-25 bioinformatics 10.64898/2026.03.22.713495 medRxiv
Top 0.1%
27.2%

The CRISPR-Cas9 system has revolutionized genome engineering, yet its full therapeutic potential remains constrained by challenges in precisely modulating its activity and specificity. Here we report a fully computational end-to-end pipeline for the de novo design of a single-domain VHH nanobody (NbSpCas9-v1) targeting a structurally conserved, non-catalytic epitope at the PAM-interacting (PI) and RuvC-III interface of Streptococcus pyogenes Cas9 (SpCas9; PDB: 4UN3). Nanobody sequences were generated using BoltzGen, a generative diffusion binder design framework, and co-folded with SpCas9 using Boltz-2 to evaluate structural confidence and binding affinity. The top-ranked model (SpCas9_4UN3_Bivalent_Hub_v1) achieved a complex pLDDT of 0.8406, an aggregate score of 0.8016, and an ipTM of >0.8, indicating high confidence in the nanobody-antigen interface. The designed 1,616-residue quaternary complex (SpCas9 + sgRNA + DNA + nanobody) was subjected to 10 ns of all-atom molecular dynamics (MD) simulation using the AMBER14SB force field within the GROMACS/OpenMM framework. The complex stabilized at RMSD ~6 Å with a radius of gyration of 39-44 Å, confirming thermodynamic stability under physiological conditions (310 K, 0.15 M NaCl). A conserved 96.3 Å inter-molecular distance between the nanobody centroid and the HNH catalytic residue H840 establishes NbSpCas9-v1 as a distal, non-inhibitory binder -- ideally suited for a Bivalent Hub architecture recruiting secondary effectors to the Cas9 ribonucleoprotein (RNP). The nanobody-Cas9 interface is stabilized by 8 hydrogen bonds, 4 salt bridges, and ~1,850 Å² of buried solvent-accessible surface area. These results provide a rigorous structural and dynamic foundation for experimental validation of VHH-based CRISPR-Cas9 enhancers and modulators.
Graphical abstract: The computational workflow proceeds from SpCas9 crystal structure acquisition (PDB: 4UN3) through BoltzGen nanobody design, Boltz-2 structural co-folding, 10 ns explicit-solvent MD validation, and Bivalent Hub functional characterization. The accompanying PyMOL rendering shows the full quaternary complex at atomistic resolution.
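
Illustrative sketch (not taken from the preprint): the abstract above reports three geometric quantities from the MD trajectory, namely RMSD, radius of gyration, and a centroid-to-residue distance. The minimal Python below shows how such quantities are commonly defined. It assumes coordinates are plain (x, y, z) tuples in Å, omits mass weighting, and skips the structural superposition that production tools normally perform before RMSD; all function names are hypothetical.

```python
import math

def centroid(coords):
    """Geometric centre of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))

def radius_of_gyration(coords):
    """Unweighted radius of gyration, in the units of the input (e.g. Å)."""
    c = centroid(coords)
    sq = sum(math.dist(p, c) ** 2 for p in coords)
    return math.sqrt(sq / len(coords))

def rmsd_no_fit(a, b):
    """RMSD between two equally sized coordinate sets.

    Assumption: the frames are already superposed; no fitting is done here.
    """
    if len(a) != len(b):
        raise ValueError("coordinate sets must have the same length")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(sq / len(a))

def centroid_to_point(coords, point):
    """Distance from the centroid of one group to a single reference atom."""
    return math.dist(centroid(coords), point)
```

For a real trajectory, each function would be applied per frame and the resulting time series inspected for a plateau.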

2
Molecular basis of Salla Disease: R39C Mutation Effects on the Lysosomal Transporter Sialin

Matsingos, C.; Lot, I.; Vaz, M.; Mailliart, J.; Boulayat, M.; Debacker, C.; Goupil-Lamy, A.; Gasnier, B.; Acher, F. C.; Anne, C.

2026-04-22 biochemistry 10.64898/2026.04.20.719580 medRxiv
Top 0.1%
23.7%

Salla disease is caused by a genetic mutation in sialin, a lysosomal membrane transporter, which exports sialic acid from lysosomes. Substrate translocation occurs via a rocker-switch mechanism that alternately exposes the substrate-binding site to the lysosomal lumen and the cytosol. The pathogenic mutation R39C found in most Salla disease patients decreases the lysosomal localisation and the transport activity. In this study, we used computational and mutagenesis approaches to elucidate the molecular effects of the R39C mutation. Using three-dimensional models of human sialin in the lumen-open (LO) and cytosol-open (CO) states combined with the mutagenesis of selected residues, we identify a critical "triplet" motif comprising R39, E194, and E262, which is associated with an ionic lock formed between K197 and D350 in the LO conformation. Molecular dynamics simulations suggest that the electrostatic triplet negatively modulates the ionic lock, and are consistent with a strengthened ionic lock in R39C sialin, potentially favouring the LO state. To assess the global effects of the R39C mutation, we computed dynamic cross-correlation matrices and identified correlation patterns consistent with an allosteric coupling between the ionic lock K197/D350 and the region surrounding the sialic acid binding site in wild-type sialin, whereas in the LO state of R39C sialin, this communication preferentially bypasses this region. Therefore, the R39C mutation may impede the LO to CO conformational transition required for sialic acid transport, providing a plausible mechanistic framework for the decreased transport activity, and possibly the decreased lysosomal localisation, observed in Salla disease. 
Highlights:
- The R39 residue participates in an interaction triplet, which negatively regulates an ionic lock stabilising the lumen-open conformation
- The R39C mutation is associated with a stronger ionic lock in the simulations, and may favour the lumen-open state
- Correlation network analysis suggests an allosteric coupling between the ionic lock and the region surrounding the sialic acid binding site
- The R39C mutation alters the inferred allosteric coupling between the ionic lock and the region surrounding the sialic acid binding site
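
Illustrative sketch (not taken from the preprint): the abstract above relies on dynamic cross-correlation matrices. A common definition is C_ij = <dr_i . dr_j> / sqrt(<dr_i^2> <dr_j^2>), where dr_i is the displacement of atom i from its mean position. The Python below implements that definition for a small trajectory held in nested lists. It assumes frames are already aligned to a reference and that every atom moves at least slightly (otherwise the normalisation divides by zero); the function name and data layout are hypothetical.

```python
import math

def dccm(traj):
    """Dynamic cross-correlation matrix.

    traj[f][i] is the (x, y, z) position of atom i in frame f.
    Returns an n_atoms x n_atoms list of lists with values in [-1, 1].
    """
    n_frames, n_atoms = len(traj), len(traj[0])
    # Mean position of every atom over the trajectory.
    mean = [
        tuple(sum(traj[f][i][k] for f in range(n_frames)) / n_frames for k in range(3))
        for i in range(n_atoms)
    ]
    # Per-frame displacement of every atom from its mean position.
    disp = [
        [tuple(traj[f][i][k] - mean[i][k] for k in range(3)) for i in range(n_atoms)]
        for f in range(n_frames)
    ]

    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

    # Time-averaged dot products of displacements.
    cov = [
        [sum(dot(disp[f][i], disp[f][j]) for f in range(n_frames)) / n_frames
         for j in range(n_atoms)]
        for i in range(n_atoms)
    ]
    return [
        [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n_atoms)]
        for i in range(n_atoms)
    ]
```

Values near +1 indicate atoms moving in the same direction, values near -1 indicate anti-correlated motion.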

3
Introducing non-enzymatic crosslinks into atomistic simulations of collagen fibrils

Giannetti, G.; Pils, J.; Graeter, F.; Monego, D.; Dellago, C.

2026-03-16 bioinformatics 10.64898/2026.03.13.711566 medRxiv
Top 0.1%
18.7%

Motivation: Collagen fibrils are the primary load-bearing units of connective tissues. However, generating atomistic, simulation-ready models remains challenging due to collagen's hierarchical organization and the diversity of its crosslinking network across tissues, ages, and metabolic states. Notably, non-enzymatic advanced glycation end-product (AGE) crosslinks, which are central to aging and diabetic complications, are largely absent from current atomistic fibril modelling workflows. Results: Here, we present an extension of the ColBuilder framework to generate atomistic collagen fibril models that incorporate three representative AGE-derived crosslinks (glucosepane, pentosidine, and MOLD) alongside enzymatic crosslinks. Amber99-compatible parameters are provided and assessed against QM-optimized reference geometries using all-atom molecular dynamics (MD) simulations. As proof-of-concept, we examine the mechanical response of single D-period collagen microfibrils featuring enzymatic-only, AGE-only, and mixed crosslink patterns in MD simulations under force, and observe that AGE crosslinks affect the fibril structure differently from enzymatic crosslinks. The extension to ColBuilder can aid future structure-based research on collagen aging. Availability and implementation: ColBuilder is available as an open-source Python command-line package at https://github.com/graeter-group/colbuilder.

4
Interplay of the ribosome A and CAR sites

Raval, M.; Zhou, Y.; Lynch, M.; Krizanc, D.; Thayer, K.; Weir, M. P.

2026-04-09 systems biology 10.64898/2026.04.07.714784 medRxiv
Top 0.1%
18.4%
Show abstract

Protein translation is a highly regulated process influenced by multiple factors at the initiation, elongation, and termination stages. One notable regulatory element of the ribosome is the CAR interaction surface, a three-residue motif in the structure of the ribosome composed of C1274 and A1427 of S. cerevisiae 18S rRNA (corresponding to C1054 and A1196 in E. coli 16S rRNA) and R146 of ribosomal protein Rps3. CAR is highly conserved and positioned adjacent to the amino-acyl (A site) decoding center. It establishes hydrogen bonds with the +1 codon next in line to enter the ribosome A site, acting as an extension of the tRNA anticodon and forming base-stacking interactions with nucleotide 34 of the tRNA. However, despite CAR's enzymatically strategic positioning within the ribosome, its functional relationship with the A site remains poorly characterized. Using molecular dynamics (MD) simulations, we examined the interplay between the A site and CAR site, revealing sequence-dependent modulation of H-bonding and π-stacking interactions within and between the two sites. These findings highlight the interplay between the A site and CAR site, suggesting a structural and functional connection between these two regions of the ribosome that may contribute to mRNA sequence-specific tuning of translation elongation.

5
A platform-agnostic evaluation of non-formalin fixed single cell RNA technologies

Haukenfrers, E. J.; Jain, V.; Arvai, S. F.; Patel, K. K.; Gregory, S. G.; Abramson, K. R.; Swain Lenz, D.

2026-01-28 genomics 10.64898/2026.01.27.702057 medRxiv
Top 0.1%
14.9%

The rapidly advancing field of single cell RNA sequencing (scRNAseq) offers numerous options for transcriptome profiling. However, questions remain as to which chemistry is appropriate for individual experimental goals. Preceding single cell benchmarking studies included previously available methods and involved a mixture of fresh and fixed samples or probe- and non-probe-based capture methods. However, the inherent differences in sample types and methods limited the conclusions to be drawn between analogous technologies. Here, we present a novel, systematic comparison of four widely used non-probe-based, non-formalin fixed scRNAseq assays. We build upon past comparisons that used varied computational pipelines by applying both platform-specific and agnostic cell calling algorithms for an unbiased comparison of biological and technical replicates from healthy human PBMCs. Our approach evaluates 10x Genomics, Parse Biosciences (QIAGEN), Scale Biosciences (10x Genomics), and Illumina scRNAseq assays to examine data based on accuracy, sensitivity, precision, power, and efficiency using agnostic and platform-specific cell calling. While metrics vary between assays, there are clear advantages and limitations to each technology, including experimental time and financial costs. In summary, our study highlights the need for carefully considered project design of non-formalin fixed scRNAseq assays, which is determined by many factors and dependent on an investigator's specific research aims and available resources.

6
Systems-level analysis identifies IRF6 as an inhibitor of epithelial-mesenchymal transition

Subbalakshmi, A. R.; Agrawal, A.; Debnath, S.; Hari, K.; Sahoo, S.; Somarelli, J.; Jolly, M. K.

2026-02-01 systems biology 10.64898/2026.01.31.702311 medRxiv
Top 0.1%
14.5%

Background: Epithelial-mesenchymal transition (EMT) and its reverse process, mesenchymal-epithelial transition (MET), are crucial during metastasis and therapy resistance. While the dynamics and master regulators of EMT are well-studied, the transcription factors that can prevent EMT or promote MET are relatively less understood. Results: Here, by integrating bulk and spatial transcriptomic data analysis from cell lines and patient samples with mechanism-based dynamical modelling, we identify IRF6 as a factor that strongly associates with an epithelial phenotype and is often inhibited during EMT. In vitro experiments in multiple cancer cell lines demonstrate the progression to a mesenchymal phenotype upon IRF6 knock-down, suggesting a role as an inhibitor of EMT. Finally, we observe that IRF6 expression levels correlate with worse patient survival in a subset of solid tumour types. Conclusion: Our integrated computational-experimental systems-level analysis suggests that IRF6 is frequently downregulated during EMT and can also prevent the progression towards a complete EMT, underscoring its role as an MET stabilizing factor.

7
An Explainable Machine Learning Approach to study the positional significance of histone post-translational modifications in gene regulation

Ramachandran, S.; Ramakrishnan, N.

2026-02-02 bioinformatics 10.64898/2026.01.30.702742 medRxiv
Top 0.1%
14.1%

Epigenetic mechanisms regulate gene expression by altering the structure of the chromatin without modifying the underlying DNA sequence. Histone post-translational modifications (PTMs) are critical epigenetic signals that influence transcriptional activity, promoting or repressing gene expression. Understanding the impact of individual PTMs and their combinatorial effects is essential to deciphering gene regulatory mechanisms. In this study, we analyzed the ChIP-seq data for 26 PTMs in yeast, examining the PTM intensities gene-wise from positions -3 to 8 in each gene. Using XGBoost classifiers, we predicted gene transcription rates and identified key histone modifications and nucleosomal positions that are critical in gene expression using explainability measures (such as SHAP). Our study provides a comprehensive insight into the histone modifications, their positions and their combinations that are most critical in gene regulation in yeast. The proposed explainable Machine Learning models can be easily extended to other model organisms to provide meaningful insights into gene regulation by epigenetic mechanisms.

8
Impact of the N-glycosylation on full-length IgG2 and IgG4 antibodies: a comparative study using molecular dynamics simulations.

LEON FOUN LIN, R.; Bellaiche, A.; Diharce, J.; Etchebest, C.

2026-04-16 bioinformatics 10.64898/2026.04.14.718417 medRxiv
Top 0.1%
12.4%

Like other proteins, monoclonal antibodies, which are important biodrugs, are subject to post-translational modifications, especially N-glycosylation. However, the effect of N-glycosylation remains poorly studied and atomistic details about its influence are rarely available. Moreover, the few existing studies focus on the prevalent immunoglobulin G1. To better understand the impact of glycosylation, we have carried out a comparative exploration of the effect of N-glycosylation on two different classes of antibodies, namely Mab231, an IgG2, and pembrolizumab, an IgG4. The two antibodies differ in their sequences, their length, and their 3D structure, but also in the location and composition of the glycans. In the present work, detailed information was gained through molecular dynamics simulations in which both monoclonal antibodies were studied with and without their glycans. The results of 1.5 µs of sampling for each system show that glycosylation does not drastically alter the overall conformational landscape of either antibody, whatever the metrics considered. However, it measurably modulates local flexibility, inter-domain correlated motions, and the relative orientation of the Fab arms with respect to the Fc domain, with statistically significant shifts in key geometric descriptors. Importantly, contact analysis reveals that glycan interactions extend beyond the Fc region to reach Fab residues. The allosteric network calculations demonstrate that the influence of Fc-bound glycans propagates as far as the Fab framework regions in both mAbs, which could impact antigen binding. The nature and magnitude of these effects are subclass-dependent, reflecting differences in glycan composition, hinge architecture, and three-dimensional organization. Our findings challenge the prevailing view that Fc glycosylation uniformly promotes CH2 domain opening.
More importantly, they underscore the necessity of considering full-length structures and IgG subclass diversity in glyco-engineering strategies.

9
An AI-Assisted Workflow for Reconstruction, Extension, and Calibration of Quantitative Systems Pharmacology Models.

Goryanin, I.; Checkley, S.; Demin, O.; Goryanin, I.

2026-04-07 systems biology 10.64898/2026.04.05.716273 medRxiv
Top 0.1%
12.4%

Background: Quantitative systems pharmacology (QSP) models provide mechanistic insight into drug response but are limited by labor-intensive, expert-driven workflows. We developed an AI-assisted QSP (AI-QSP) framework that integrates large language models (LLMs) with SBML-based modeling to enable automated reconstruction, extension, and calibration of mechanistic models. Methods: The framework was applied to a published CAR-T QSP model. The model was reconstructed in SBML and extended via LLM-guided prompts to incorporate key resistance mechanisms: T-cell exhaustion, PD-1/PD-L1 checkpoint regulation, and tumor antigen escape. Model development followed an iterative expert-in-the-loop workflow. The resulting model (21 reactions, 9 species) was calibrated to synthetic benchmark data using 19-parameter optimization. Model credibility was assessed using ASME V&V 40 and ICH M15 principles, including global sensitivity and profile-likelihood analyses. Results: The calibrated model reproduced benchmark dynamics with high accuracy (mean log-RMSE = 0.132). Sensitivity analysis identified CAR-T killing and bystander cytotoxicity as dominant drivers of tumor response. Profile-likelihood analysis showed 71% of parameters were practically identifiable, with remaining parameters prioritised for future data-driven refinement. Conclusions: AI-assisted QSP modeling enables reproducible, scalable model reconstruction and evolution while maintaining mechanistic transparency and regulatory alignment. This framework provides a foundation for accelerating model-informed drug development in cell and gene therapies.
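
Illustrative sketch (not taken from the preprint): the abstract above quotes a "mean log-RMSE" but does not define it. One plausible reading, assumed here, is the root-mean-square error of log10-transformed values computed per observable and then averaged across observables. The Python below implements that assumed definition; the function names and the dictionary layout are hypothetical, and the authors' actual metric may differ (for example in the log base or the averaging scheme).

```python
import math

def log_rmse(observed, simulated):
    """RMSE between log10(simulated) and log10(observed).

    Assumption: all values are strictly positive (e.g. cell counts).
    """
    if len(observed) != len(simulated):
        raise ValueError("series must have the same length")
    sq = [(math.log10(s) - math.log10(o)) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(sq) / len(sq))

def mean_log_rmse(series):
    """Average log-RMSE over observables.

    series maps an observable name to an (observed, simulated) pair of lists.
    """
    values = [log_rmse(obs, sim) for obs, sim in series.values()]
    return sum(values) / len(values)
```

Under this reading, a log-RMSE of 1.0 corresponds to a typical tenfold discrepancy, so smaller values indicate closer agreement on a multiplicative scale.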

10
Rational Design of Selective IL-2-based Activators for CAR T Cells Using AlphaFold3 and Physics-Informed Machine Learning

Dahmani, L. Z.; Banerjee, A.

2026-03-12 bioinformatics 10.64898/2026.03.10.710391 medRxiv
Top 0.1%
12.3%

Recombinant human Interleukin-2 (rhIL-2, Aldesleukin) is used in immunotherapy for metastatic melanoma and renal cell carcinoma. Low-dose IL-2 has been investigated for administration after adoptive T cell transfer to enhance CAR T expansion and sustain effector function. However, systemic IL-2 can cause severe toxicities and promote expansion of regulatory T cells (Tregs). Previous attempts at mitigating cytokine-mediated side effects involved isolating CAR T cell signaling from endogenous immune responses by developing IL-2/IL-2Rβ-based selective ligand-receptor systems. Expressing these variant orthogonal (ortho)IL-2Rβ receptors in CAR T cells and supplying variant orthoIL-2 was shown to dramatically improve selectivity in CAR T cell expansion and anti-tumoral potency in a leukemia mouse model. This study describes the computational design of synthetic orthogonal cytokine receptor-ligand systems based on the scaffolds of the human canonical IL-2 and IL-2Rβ. Leveraging state-of-the-art AlphaFold3 (AF3) structure prediction capabilities and a physics-informed constrained sequence generator (CSG), the pipeline generates, filters and ranks sets of putative orthoIL-2/orthoIL-2Rβ mutant designs. Variants displaying minimal predicted off-target interactions and enhanced on-target contacts are prioritized for structural modelling. Top designs showed outstanding AF3 structural and interfacial quality metrics ipTM and pTM, with averages between cognate pairs of 0.724±0.05 and 0.770±0.042, respectively. All in-silico hits showed ipTM <0.5 for non-cognates, indicating a good likelihood of orthogonality. Additionally, putative hits showed high levels of predicted structural fidelity to wild-type (WT) human IL-2/IL-2Rβ (PDB: 2ERJ), with an average structural root-mean-square deviation (RMSD) of 0.843±0.375 Å. These mutants incorporated 7-26 interfacial mutations derived from multiple interface selection strategies.
Altogether, the results support the putative foldability and selective affinity of top-ranking mutants displaying metrics close to or within the experimental reference range. Finally, strengths and limitations are discussed, alongside the experimental implications of coupling a constrained protein design pipeline to the discovery and validation of selective binders based on naturally occurring scaffolds.

11
Genomic Evolution of SARS-CoV-2 Delta Variants Pre- and Post-Omicron Emergence using Alignment-free Machine Learning models

Sankar, S.; Anandharaman, K.; Selvam, P.; Jayaraman, A.; Jayakumar, D.; Sivadoss, R.; Esaki Muthu, S.; Velu, V.; Larsson, M.; Balakrishnan, P.

2026-02-23 genomics 10.64898/2026.02.20.706927 medRxiv
Top 0.1%
12.1%

The SARS-CoV-2 Delta variant (B.1.617.2), initially classified as a variant of concern due to its enhanced transmissibility and vaccine-escape mutations, underwent further genomic changes following the emergence of the Omicron variant (B.1.1.529). This study investigates the genomic differences in Delta variant spike gene sequences collected before and after the emergence of Omicron. A total of 190 sequences were analyzed using an alignment-free approach incorporating k-mer-based feature extraction and machine learning models, including convolutional neural networks (CNN), K-means clustering, and random forest classification. The random forest model achieved 93% accuracy, with significant F1 scores, effectively distinguishing the two Delta variant groups. Comparative analysis revealed 157 persistent mutations and four vanished mutations in the post-Omicron group. Cluster analysis showed notable shifts, indicating stable yet evolving genomic patterns over time. The study demonstrates the advantage of alignment-free methods in detecting subtle sequence variations that alignment-based approaches may overlook. These findings enhance our understanding of SARS-CoV-2 evolution and provide a framework for identifying key genomic signatures relevant to public health. The methodology and insights gained offer potential applications in variant surveillance, vaccine design, and viral evolutionary studies, supporting preparedness for future SARS-CoV-2 variant emergence.
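
Illustrative sketch (not taken from the preprint): alignment-free methods like the one described above typically turn each sequence into a fixed-length vector of k-mer frequencies before any classifier sees it. The Python below shows one minimal way to do that. The value of k, the normalisation to frequencies, and the decision to skip windows containing ambiguous bases such as N are all assumptions, since the abstract does not state them; the function name is hypothetical.

```python
from collections import Counter
from itertools import product

def kmer_vector(seq, k=3, alphabet="ACGT"):
    """Map a nucleotide sequence to normalised k-mer frequencies.

    Returns a dict with one entry per possible k-mer (len(alphabet) ** k keys),
    so every sequence yields a vector of the same length regardless of its size.
    Windows containing characters outside the alphabet are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[w] for w in vocab)
    return {w: (counts[w] / total if total else 0.0) for w in vocab}
```

Because the vocabulary is fixed, vectors from sequences of different lengths can be stacked into a feature matrix and passed to a clustering or classification model without a multiple sequence alignment.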

12
A Proof-of-Concept Study of a Clinical Decision Support System for Vancomycin Therapeutic Monitoring

Hassan, F.; Lou, J. Y.; Lim, C. T.; Ong, W. Q.; Rumaizi, N. N.

2026-03-02 pharmacology and therapeutics 10.64898/2026.02.22.26346368 medRxiv
Top 0.1%
10.9%

Artificial intelligence (AI), particularly large language models (LLMs), is increasingly explored in healthcare, yet its real-world usability and safety in high-risk clinical pharmacy tasks remain uncertain. Vancomycin therapeutic drug monitoring (TDM), which requires precise pharmacokinetic calculations and context-sensitive interpretation within a narrow therapeutic window, provides a stringent test case for AI-assisted decision support. This proof-of-concept study developed and evaluated a hybrid clinical decision support system (TDM-AID) integrating a validated deterministic pharmacokinetic calculation engine, GPT-4o-based structured clinical interpretation, and retrieval-augmented guideline support. Thirty retrospective adult vancomycin TDM cases were assessed using a weighted six-domain rubric covering pharmacokinetic accuracy, AUC estimation, prospective prediction, timing recommendations, clinical judgment, and documentation quality. Two independent expert pharmacists evaluated system outputs against benchmark consultations. The overall median performance was 78% (IQR 12%), classified as Acceptable, and 73% (IQR 14%) when deterministic calculations were excluded. Foundational pharmacokinetic calculations achieved 100% accuracy. Clinical judgment demonstrated Good performance (83%), whereas prospective prediction was limited (58%), and timing recommendations were absent in all cases. Safety violations occurred in 17% of cases, including dose recommendations exceeding 4 g/day. Inter-rater reliability was good (ICC 0.87). These findings suggest that hybrid AI-driven decision support is technically feasible and usable as a pharmacist-augmenting draft generator; however, limitations in predictive reasoning, timing logistics, and safety enforcement necessitate deterministic safeguards and mandatory expert oversight before clinical implementation.

13
AQuA2-Cloud: a web platform for fluorescence bioimaging activity analysis

Bright, M.; Mi, X.; Duarte, D.; Carey, E.; Lyu, B.; Wang, Y.; Nimmerjahn, A.; Yu, G.

2026-03-10 bioinformatics 10.64898/2026.03.06.709938 medRxiv
Top 0.1%
10.7%

Background: Advanced biological imaging analysis platforms such as Activity Quantification and Analysis (AQuA2) enable accurate spatiotemporal activity analysis across diverse cell populations within many species. These tools are increasingly important for investigating cellular signaling dynamics and behavior. However, despite advances in the accuracy and species capability of AQuA2, it remains computationally demanding for analysis of long time-series datasets and requires all users to maintain a MATLAB license, which may limit accessibility and large-scale deployment. Results: To address these limitations, we have designed and made available AQuA2-Cloud, a portable software stack and web platform developed as an improvement and further evolution of AQuA2. This container-deployable system permits multi-user cloud-based high accuracy activity quantification with intuitive workflows, export of analysis data and project files, and comparable processing times. The platform offers integrated features such as in-browser analysis control interfaces, asynchronous program state control, multiple users and user management, support for unreliable connections, file uploading and downloading via web browsers and File Transfer Protocol, and centralized organization of analysis output. Conclusion: AQuA2-Cloud constitutes a cloud-native solution for laboratories or research groups seeking to centralize analysis of spatiotemporal biological imaging datasets while reducing software installation and licensing barriers for end users. The platform enables researchers with minimal technical expertise to perform advanced bioimaging analysis through standard web browsers while maintaining the analytical capabilities of AQuA2. AQuA2-Cloud source code, deployment procedures, and documentation are freely available at https://github.com/yu-lab-vt/AQuA2-Cloud.

14
Assessing the impact of parental linear gene normalization on the performance of statistical models for circular RNA differential expression analysis

Qorri, E.; Varga, V.; Priskin, K.; Latinovics, D.; Takacs, B.; Pekker, E.; Jaksa, G.; Csanyi, B.; Torday, L.; Bassam, A.; Kahan, Z.; Pinter, L.; Haracska, L.

2026-03-09 bioinformatics 10.64898/2026.03.06.710045 medRxiv
Top 0.1%
10.3%

Background: Circular RNAs (circRNAs) have emerged as promising non-invasive cancer biomarkers due to their stability, abundance in body fluids, and regulatory potential. However, circRNA differential expression analysis (DEA) remains challenging, largely owing to a lack of consensus on important preprocessing strategies such as filtering and normalization. While well-established bulk RNA-sequencing frameworks are commonly applied to circRNA data, newer approaches such as CIRI-DE (part of the CIRI3 suite) integrate both linear and circular transcript information to improve detection. Despite these developments, an assessment of these integrative strategies is lacking, and the critical impact of filtering on DEA model performance has not been comprehensively evaluated. Results: In this study, we evaluated the impact of multiple normalization and filtering strategies on circRNA DEA using five experimental datasets, including two in-house blood platelet sets and semi-parametric simulated in silico datasets. Our results emphasize the importance of selecting an appropriate filtering threshold, as overly lenient filtering substantially reduced model performance across datasets. We found edgeR's filterByExpr() strategy particularly effective in handling zero counts in circRNA data, while also generating the most reliable results across most datasets. Furthermore, by incorporating linear and circular information as described in CIRI-DE, most methods identified a higher number of differentially expressed (DE) circRNAs compared to circular counts alone. Notably, circRNAs identified by both CIRI-DE and the modified bulk RNA-sequencing pipelines showed substantial overlap. Conclusion: Our findings demonstrate that automated filtering combined with linear-aware normalization significantly enhances the sensitivity and reproducibility of circRNA DEA, providing a standardized framework for more reliable biomarker discovery in transcriptomic research.

15
Defining the DNA Binding Specificity of GRHL2

Messa, P. E.; Warren, C. L.; Nicol, N. R.; Pearson, K. S.; Peters, J. P.; Fowler, A. M.; Alarid, E. T.; Ozers, M. S.

2026-04-18 biochemistry 10.64898/2026.04.16.719077 medRxiv
Top 0.1%
10.2%

Grainyhead-like 2 (GRHL2) is an epithelial transcription factor with context-dependent regulatory roles, yet the sequence rules governing its DNA recognition remain incompletely defined. In this study, a high-density genomic Specificity and Affinity for Protein (SNAP) DNA-binding array containing 772,732 tiled probes derived from GRHL2 ChIP-seq regions was used to resolve GRHL2 binding specificity at 6 base pair resolution across genomic sequences. From high-affinity probes, de novo motif analysis recovered the canonical 5′-AACCGGTT-3′ motif. Sequence specificity landscapes revealed a stepwise reduction in binding as mismatches were introduced, with the strongest effects at the C (position 3) and G (position 6) within the motif, greater tolerance at the central CG dinucleotide, and intermediate tolerance at the A/T bases at the motif edges. This analysis also demonstrated the influence of nearby flanking sequences. Extended motif and spacing analyses indicated dimeric binding at paired motifs, with periodic helical spacing consistent with interactions on the same face of the DNA helix. Integration of SNAP array binding with ChIP-seq data distinguished direct, motif-encoded GRHL2 occupancy from indirect, cofactor-mediated recruitment at genomic sites. These results define the sequence specificity of GRHL2 interactions with variations in the DNA consensus motif and flanking sequences within an endogenous genomic context.
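
Illustrative sketch (not taken from the preprint): the abstract above describes binding falling off stepwise as mismatches to the AACCGGTT consensus accumulate. The Python below shows the simplest way to locate near-consensus sites in a sequence by counting mismatches per window on both strands. It only counts mismatches and does not model the position-specific penalties or flanking effects the study measures; the function names and the default mismatch threshold are hypothetical.

```python
MOTIF = "AACCGGTT"  # consensus reported in the abstract above
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def mismatches(window, motif=MOTIF):
    """Number of positions at which window differs from motif."""
    return sum(a != b for a, b in zip(window, motif))

def scan(seq, motif=MOTIF, max_mismatches=1):
    """Return (start, window, n_mismatches) for every near-consensus window.

    Each window is scored against the motif and its reverse complement,
    and the smaller mismatch count is kept.
    """
    seq = seq.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        mm = min(mismatches(window, motif), mismatches(window, rc))
        if mm <= max_mismatches:
            hits.append((i, window, mm))
    return hits
```

AACCGGTT is its own reverse complement, so the two-strand check is redundant for this motif, but it keeps the function usable for non-palindromic motifs.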

16
Rapid and Interpretable AMR Diagnostics via Genomics and Cell Painting using Differential Geometry-based Directed-Simplicial Neural Networks on Multimodal Data

Thakur, L. S.; Mahajan, S. S.; Bharj, G.; Ding, M.; Dekanoidze, N.; Shrivastava, V.

2026-03-12 microbiology 10.64898/2026.03.11.711128 medRxiv
Top 0.2%
10.1%

Antimicrobial resistance (AMR) remains a critical global health challenge, particularly in high-prevalence regions such as India, where rapid and interpretable diagnostic tools are urgently needed. To address this challenge, we present a computational framework for AMR prediction that integrates genomic and cellular phenotypic data using an in-house developed differential geometry-based Directed Simplicial Neural Network (Dg-Dir-SNNs) applied to multimodal datasets. Using this framework, we analyzed 384 clinically relevant AMR isolates, including Escherichia coli and Klebsiella pneumoniae, integrating 256 genomic k-mer features with 503 cellular morphology descriptors derived from high-content Cell Painting assays. The Dg-Dir-SNNs model constructs an inferred-causal network of top-ranked biomarker-driving features, predicting potential directional dependencies among genomic motifs and phenotypic features. Network analysis identified kmer_TATG as the top-ranked driver associated with predicted resistance, with a local neighborhood including other genomic motifs (kmer_TTTT, kmer_CGTG, kmer_TCAC, kmer_CGTA, kmer_GAAA, kmer_TAAA, kmer_TACA, kmer_TGTG, kmer_TGAG, kmer_AAAA) and a key morphological feature (Cells_correlation_ER_Brightfield). These relationships suggest potential mechanistic associations in which specific genomic motifs may influence cellular phenotypes linked to antimicrobial resistance. Although not yet clinically deployed, this approach demonstrates the potential of multimodal AI-driven modeling for rapid in silico AMR prediction. By providing interpretable, biologically grounded insights, the framework may support future diagnostic development, targeted surveillance strategies, and experimental validation in high-resistance healthcare settings.

17
Modeling and dissecting bidirectional feedback in gene-metabolite systems using the CausalFlux method

Subramanian, N.; Kumar, S. P.; Rengaswamy, R.; Bhatt, N. P.; Narayanan, M.

2026-04-13 systems biology 10.64898/2026.04.10.717623 medRxiv
Top 0.2%
10.1%

Predicting cellular behaviors, a central task in systems biology and metabolic engineering, can be enhanced through integrative modeling of processes such as gene regulation and metabolism. Information flow from gene regulation (modeled via a gene regulatory network) to metabolism (modeled via a genome-scale metabolic model) is well-studied, but the reciprocal regulation of genes by metabolites is less explored. We introduce CausalFlux, a method that models bidirectional feedback between genes and metabolites, in order to predict steady-state reaction fluxes under wild-type (WT) or perturbed (e.g., gene knockout/KO) conditions. CausalFlux does so by iteratively performing causal surgery on a Bayesian gene regulatory network and constraint-based analysis of a coupled metabolic model. CausalFlux enabled us to assess the impact of two-way feedback in several testbed models and real-world biological systems by comparing its predictions to those of TRIMER, a state-of-the-art model of gene-to-metabolite one-way feedback. Incorporating bidirectional feedback, as in CausalFlux, improved the Spearman correlation between actual and predicted fluxes in 92% of the 39 distinct simulation conditions relative to TRIMER. For predicting growth/no-growth phenotype following single-gene KOs in E. coli, CausalFlux achieved a balanced accuracy of 0.79 in identifying essential genes, and TRIMER achieved 0.71 for the same task, again highlighting the importance of modeling two-way feedback. In ablation studies that further dissect the role of specific metabolite→gene feedback edges in E. coli, the F1 scores of gene essentiality predictions decreased by 7.5% and 13% upon ablation of feedback edges from any metabolite to the crp gene and the 10 metabolic feedback genes with the highest influence on the KO genes, respectively. Finally, we highlight the application of CausalFlux to predict the essentiality of several hundred genes under different media conditions.
Overall, our findings show that CausalFlux can crucially utilize information on feedback metabolites to predict trends in reaction fluxes and qualitative (growth/no-growth) outcomes, thereby encouraging future systems modeling efforts to carefully incorporate not only gene-to-metabolite but also metabolite-to-gene interactions.
Availability: Code pertaining to the CausalFlux method, and to its benchmarking and application, is publicly available at: https://github.com/BIRDSgroup/CausalFlux.
Author summary: The myriad processes within a living cell, such as gene regulation or metabolism, are tightly interconnected. Modeling these interconnected processes can offer a deeper mechanistic understanding of cellular behaviors, as well as guide efforts that engineer the metabolic output of a cell. In this work, we develop a novel integrated model of gene regulation and metabolism that incorporates bidirectional feedback between these two processes, via the concept of metabolite-induced causal surgery on a gene regulatory network and gene-induced constraints on the fluxes of metabolic reactions. Our model, which we call CausalFlux, represents an advance over most existing models that capture just the one-way gene-to-metabolism feedback (i.e., genes coding for enzymes that control metabolic reactions). Our CausalFlux methodology opens up a unique opportunity to quantify the impact of two-way feedback in gene-metabolite systems, via comparison of CausalFlux's predictions to those of TRIMER, a published model incorporating one-way feedback alone. For predicting reaction fluxes in testbed models and essential genes in E. coli, quantitative comparison of the performance of CausalFlux vs. TRIMER showed that accounting for two-way feedback leads to more accurate and biologically meaningful predictions. CausalFlux also enabled us to quantify the effect of two-way feedback by comparing prediction performance before and after ablation of certain feedback edges from metabolites to genes.
Overall, our findings highlight the importance of modeling gene regulation and metabolism as two-way interconnected systems within a living cell, and encourage future works to incorporate gene↔metabolite feedback into their analyses.

18
User-driven development and evaluation of an agentic framework for analysis of large pathway diagrams

Corradi, M.; Djidrovski, I.; Ladeira, L.; Staumont, B.; Verhoeven, A.; Sanz Serrano, J.; Rougny, A.; Vaez, A.; Hemedan, A.; Mazein, A.; Niarakis, A.; de Carvalho e Silva, A.; Auffray, C.; Wilighagen, E.; Kuchovska, E.; Schreiber, F.; Balaur, I.; Calzone, L.; Matthews, L.; Veschini, L.; Gillespie, M. E.; Kutmon, M.; Koenig, M.; van Welzen, M.; Hiroi, N.; Lopata, O.; Klemmer, P.; Overall, R.; Hofer, T.; Satagopam, V.; Schneider, R.; Teunis, M.; Geris, L.; Ostaszewski, M.

2026-03-12 bioinformatics 10.64898/2026.03.10.710813 medRxiv
Top 0.2%
10.0%

As biomedical knowledge keeps growing, resources storing available information multiply and grow in size and complexity. Such resources can take the form of molecular interaction maps, which represent cellular and molecular processes under normal or pathological conditions. However, these maps can be complex and hard to navigate, especially for novice users. Large Language Models (LLMs), particularly in the form of agentic frameworks, have emerged as a promising technology to support this exploration. In this article, we describe a user-driven process of prototyping, development, and user testing of Llemy, an LLM-based system for exploring these molecular interaction maps. By involving domain experts from the very first prototyping stage, run as a hackathon, and collecting both fine-grained and general feedback on more refined versions, we were able to evaluate the perceived utility and quality of the developed system, in particular for summarising maps and pathways, as well as prioritise the development of future features. We recommend continued user-driven development and benchmarking to keep the community engaged. This will also facilitate the transition towards open-weight LLMs to support the needs of the open research environment in an ever-changing technology landscape.

19
Gender-Specific Osteoporosis Risk Prediction Using Longitudinal Clinical Data and Machine Learning

Tripathy, S.; Saripalli, L.; Berry, K.; Jayasuriya, A. C.; Kaur, D.; Syed, F.

2026-02-17 orthopedics 10.64898/2026.02.13.26346244 medRxiv
Top 0.2%
10.0%

Osteoporosis is a silent yet debilitating disease that often remains undetected until fractures occur. While early prediction is crucial, most studies combine male and female datasets to train a single model, introducing bias since osteoporosis risk and progression differ by gender. This study aims to develop gender-specific machine learning models that leverage longitudinal data to predict osteoporosis risk, providing tailored insights for men and women. Data were obtained from two large longitudinal cohorts: the Study of Osteoporotic Fractures (SOF) for women and the Osteoporotic Fractures in Men Study (MrOS) for men. Multiple ML algorithms were trained and evaluated for each sex, with model performance assessed using the area under the receiver operating characteristic curve (AUC-ROC). Among the tested models, the XGBoost model demonstrated the best performance for women, achieving an AUC-ROC of 0.93 using SOF data. For men, the Random Forest model achieved an AUC-ROC of 0.89 using MrOS data. Feature importance analysis identified sex-specific osteoporosis risk factors, underscoring the need for tailored prediction and management. By revealing male and female risk factors and reducing bias from combined datasets, the work advances personalized care and supports earlier, effective clinical intervention to prevent fractures and improve health outcomes.

20
AI-guided design of candidate BMPR1A-binding peptides for cartilage regeneration: a multi-tool computational benchmarking study

Ahmadov, A.; Ahmadov, O.

2026-03-25 bioinformatics 10.64898/2026.03.22.713519 medRxiv
Top 0.2%
9.2%

Bone morphogenetic protein receptor type IA (BMPR1A) is a key mediator of chondrogenesis and a validated therapeutic target for cartilage repair, yet existing BMP mimetic peptides suffer from low potency and the full-length protein (rhBMP-2) carries significant safety risks. Generative AI tools for protein design can now produce de novo peptide binders, but none have been applied to cartilage regeneration targets. Here, we benchmarked four architecturally distinct AI tools (RFdiffusion, BindCraft, PepMLM, and RFpeptides) to design candidate BMPR1A-binding peptides. We generated 192 candidates alongside 98 negative controls (290 total) and evaluated all complexes using AlphaFold 3 structure prediction, dual physics-based energy scoring (PyRosetta and FoldX), and contact recapitulation against the crystallographic BMP-2:BMPR1A interface (PDB: 1REW). A four-metric composite ranking identified a 15-residue PepMLM design (pepmlm_L15_0026) as the top candidate, combining favorable binding energy (PyRosetta dGseparated = -45.9 REU; FoldX ΔG = -19.4 kcal/mol) with the highest contact recapitulation among top-ranked peptides (11/30 gold-standard interface residues). Designed candidates significantly outperformed controls on ipTM (p = 0.002) and FoldX ΔG (p < 0.001). BindCraft candidates achieved the highest structural confidence (ipTM up to 0.81) but exhibited moderate contact recapitulation (mean 0.224), consistent with the computational hypothesis that they may engage alternative BMPR1A binding surfaces rather than the native BMP-2 interface. Physicochemical filtering yielded a shortlist of 54 candidates across all four tools. These results establish a reproducible computational framework for AI-guided peptide design targeting cartilage regeneration and identify specific candidates for future experimental validation via binding assays and chondrocyte differentiation studies.
Author summary: Damaged cartilage has limited capacity to heal, and current biological therapies based on bone morphogenetic protein 2 (BMP-2) carry serious safety concerns including ectopic bone formation and inflammation. Short peptides that mimic BMP-2's interaction with its receptor BMPR1A could offer a safer, more targeted alternative, but designing such peptides from scratch is challenging. We used four different artificial intelligence tools, each employing a distinct computational strategy, to generate 192 candidate peptides designed to bind BMPR1A. We then evaluated all candidates using multiple independent computational methods to assess binding quality, energy favorability, and whether each peptide targets the correct site on the receptor. Our analysis identified a shortlist of 54 promising candidates, with a 15-residue peptide from the language model-based tool PepMLM emerging as the top-ranked design. We also found evidence that one tool (BindCraft) may produce peptides that bind BMPR1A at sites different from the natural BMP-2 interface, highlighting the importance of validating not just whether a peptide binds, but where it binds. Our computational framework and candidate peptides provide a foundation for future laboratory testing toward cartilage repair therapies.